TURBO-CODE DECODER
BACKGROUND OF THE INVENTION

Field of Invention

[0001] The present invention generally relates to a decoder, and more particularly, to a
fast turbo-code decoder. The decoder is built from systolic array very-large-scale
integration (VLSI) circuits, in which the output of the previous level is used as the input
of the next level. Thus, the advantages of parallel and pipelined calculation are fully
achieved. The decoding speed is markedly improved compared to that of the conventional
decoder: it is about 5*(N+M) times faster, where N stands for the block length and M
stands for the register size.

Description of Related Art

[0002] Error control coding is widely used in communication systems and computer
storage media. In 1993, Berrou, Glavieux and Thitimajshima first proposed the turbo-code,
whose error-correcting capability approaches the Shannon limit (C. Berrou, A. Glavieux,
and P. Thitimajshima, "Near Shannon Limit Error-correcting Coding and Decoding:
Turbo-codes (1)," in Proc. ICC'93, May, 1993). Because of its excellent error-correcting
capability, the turbo-code is widely applied in general communication systems such as the
CDMA transmission system. However, if the transmission block length used by the
conventional decoding algorithm is too small, the error-correcting capability is poor. On
the other hand, if the transmission block length is too large, the decoding delay becomes
too long to tolerate in a communication system that requires real-time processing.
Therefore, it is important to solve this problem to fulfill the requirements of current
high-speed communication.

SUMMARY OF THE INVENTION 
[0003] To solve the problem mentioned above, and to increase the computing speed and
thus the throughput, the present invention provides a structure design using parallel and
systolic array VLSI.

[0004] In this structure design, the decoder is built from systolic array VLSI circuits.
Since the output of the previous level is used as the input of the next level, the
advantages of parallel and pipelined calculation are fully achieved. The latency is only
N+M+2 units of time, which is about 1/5 of the 5*(N+M) units of time taken by the
conventional sequential calculation structure. The decoding throughput is about 5*(N+M)
times higher than that of the conventional decoder. Although the number of circuit gates is
also about 5*(N+M) times higher than in the conventional decoder, VLSI techniques have
improved steadily, so the hardware complexity is easy to overcome. Trading hardware cost
for higher speed is expected to remain the trend.

[0005] In order to achieve the objective mentioned above, the present invention uses a
parallel and systolic array VLSI structure design to provide a turbo-code decoder for a
communication system. The decoder comprises a serial-to-parallel output unit and a
plurality of parallel decoding units. The serial-to-parallel output unit receives a serial
input signal, converts it, and outputs a parallel signal. The parallel decoding units are
serially connected to form a plurality of levels. The first level parallel decoding unit
receives the parallel signal output from the serial-to-parallel output unit. The output of
the first level parallel decoding unit is sent to the second level parallel decoding unit,
and so on in sequence, so that the parallel signal passes through the parallel decoding
units for the decoding process.
[0006] In the turbo-code decoder mentioned above, each parallel decoding unit receives an
extrinsic parameter when performing the decoding process, generates a new extrinsic
parameter as the signal resulting from its decoding process, and sends that extrinsic
parameter to the parallel decoding unit of the next level.

[0007] In the turbo-code decoder mentioned above, the extrinsic parameter is obtained
from a deinterleaving operation. The extrinsic parameter input to the first level parallel
decoding unit is $L_{0,k} = (0, 0, \ldots, 0)$, where k = 1, 2, ..., N, and N is the block
length of the turbo-code.

[0008] In the turbo-code decoder mentioned above, the serial input signals are the
$r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$ messages of the turbo-code, where k = 1, 2, ..., N,
and N is the block length of the turbo-code.

[0009] In the turbo-code decoder mentioned above, the serial-to-parallel output unit
receives $r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$, where the subscript k = 0, 1, ..., N+M-1
covers the whole block and the end message, and M stands for the register size of the
turbo-code decoder. The serial-to-parallel output unit converts the received $r_{1s,k}$,
$r_{1p,k}$, and $r_{2p,k}$ messages and outputs the results to the first level parallel
decoding unit in parallel. The first level parallel decoding unit also receives an
extrinsic parameter $L_{0,k}$ at the same time. $L_{0,k}$ is the parameter obtained via a
deinterleaving operation on the extrinsic parameter $\Lambda(d_k)$ of the previous level.
The initial value of the extrinsic parameter of the first level decoding unit is set as
$L_{0,k} = (0, 0, \ldots, 0)$, and a first level extrinsic parameter $L_{1,k}$ is generated
by the first level parallel decoding unit. The messages $r_{1s,k}$, $r_{1p,k}$, and
$r_{2p,k}$ are passed on in sequence to be the input of the next level.

[0010] In the turbo-code decoder mentioned above, the parallel decoding unit comprises a
first decoder, a second decoder, an interleaving unit, and a deinterleaving unit. The first
decoder receives the $r_{1s,k}$ and $r_{1p,k}$ messages and an extrinsic parameter. The
second decoder receives the $r_{2p,k}$ message and an extrinsic parameter. The interleaving
unit is located between the first decoder and the second decoder and receives the output of
the first decoder. The deinterleaving unit is connected to the second decoder; the outputs
of the first decoder and the second decoder are output alternately.

[0011] In the turbo-code decoder mentioned above, the first decoder of the parallel
decoding units constitutes a systolic array VLSI circuit structure.
[0012] In the turbo-code decoder mentioned above, the systolic array VLSI circuit is
composed of N+M units each of modules C, A, B, D, and E. Module C receives the extrinsic
parameter, $r_{1s,k}$ and $r_{1p,k}$, and outputs $\gamma_k^{(1)}(m',m)$ and
$\gamma_k^{(0)}(m',m)$. Module A calculates a forward recursive probability parameter
$\alpha_k$. Module B calculates a backward recursive probability parameter $\beta_k$.
Module D uses N+M units in parallel to obtain $\Lambda(d_k)$ after the calculation of
$\alpha_k$, $\beta_k$, and $\gamma_k^{(i)}$ is finished. Module E outputs the values
calculated by module D, for k = 0, 1, ..., N+M-1.
[0013] In the turbo-code decoder mentioned above, the value of $\Lambda(d_k)$ is
calculated according to a MAP algorithm and the following equation:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}$$

[0014] Here, $\alpha_k$ is the forward recursive probability parameter, $\beta_k$ is the
backward recursive probability parameter, and $\gamma_k^{(i)}$ is a branch probability
parameter.
[0015] In the turbo-code decoder mentioned above, the forward recursive probability
parameter $\alpha_k$ is obtained from the previous parameter $\alpha_{k-1}$ and the branch
probability parameter $\gamma_k^{(i)}$, as follows:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}$$

[0016] In the turbo-code decoder mentioned above, the backward recursive probability
parameter $\beta_k$ is obtained from the next parameter $\beta_{k+1}$ and the branch
probability parameter $\gamma_{k+1}^{(i)}$, as follows:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m,m')\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\alpha_k(m')}$$

[0017] In the turbo-code decoder mentioned above, the branch probability parameter
$\gamma_k^{(i)}$ is obtained from the following equation according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot q(d_k = i \mid s_k = m, s_{k-1} = m') \cdot \Pr\{s_k = m \mid s_{k-1} = m'\}$$

[0018] Here, the parameter $q(d_k = i \mid s_k = m, s_{k-1} = m')$ is 1 if the input bit
$d_k = i$ (0 or 1) causes the transition from state $m'$ to state $m$, and 0 otherwise.

BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The accompanying drawings are included to provide a further understanding of
the invention, and are incorporated in and constitute a part of this specification. The
drawings illustrate embodiments of the invention and, together with the description, serve
to explain the principles of the invention. In the drawings,

[0020] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC
encoders;

[0021] FIG. 2 schematically shows the decoding structure of the turbo-code;

[0022] FIG. 3 schematically shows the structure of the P-level parallel decoding units
(Level 1, Level 2, ..., Level P);

[0023] FIG. 4 schematically shows the structure of the first level decoding unit of the
parallel decoding units in FIG. 3;

[0024] FIG. 5 schematically shows the systolic array VLSI structure that makes up the
first level decoding unit of the parallel decoding units in FIG. 4;

[0025] FIG. 6 schematically shows the structure of the simplified modules, data streams,
and latches of the parallel decoding units in FIG. 3 when N=4 and M=3;

[0026] FIG. 7 schematically shows the calculation structure of the branch probability
parameter $\gamma_k^{(i)}(m',m)$;

[0027] FIG. 8 schematically shows the structure of module A for calculating $\alpha_k$;

[0028] FIG. 9 schematically shows the structure of module B for calculating $\beta_k$;

[0029] FIG. 10 schematically shows the structure of module D for calculating
$\Lambda(d_k)$;

[0030] FIG. 11 schematically shows the structure of the calculation submodule L (using
an analog circuit);

[0031] FIG. 12 schematically shows the structure of the fast RSC encoder, wherein
Gb=1011 and Gd=1110;

[0032] FIG. 13 schematically shows the trellis diagram;

[0033] FIG. 14 schematically shows the detailed structure of module A (wherein the
submodule L is designed as a digital circuit);

[0034] FIG. 15 schematically shows the detailed structure of module D;

[0035] FIG. 16 schematically shows the latency for completing a message one block size
in length; and

[0036] FIG. 17 schematically shows the comparison of the bit error rate, wherein the
iterative decoding number P=6, the code ratio R=1/3, the register size M=3, the generator
parameters are Gb=1011 and Gd=1110, and the 256*256 random interleaving method is adopted
by the first decoder and the second decoder.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0037] The present invention provides a structure design adopting parallel and systolic
array VLSI. In this structure design, the decoder is built from systolic array VLSI
circuits. Since the output of the previous level is used as the input of the next level,
the advantages of parallel and pipelined calculation are fully achieved. The latency is
only N+M+2 units of time, which is about 1/5 of the 5*(N+M) units of time taken by the
conventional sequential calculation structure. The decoding throughput is about 5*(N+M)
times higher than that of the conventional decoder. Although the number of circuit gates is
also about 5*(N+M) times higher than in the conventional decoder, VLSI techniques have
improved steadily, so the hardware complexity is easy to overcome. Trading hardware cost
for higher speed is expected to remain the trend.

[0038] In 1993, Berrou, Glavieux and Thitimajshima first proposed the turbo-code, whose
error-correcting capability approaches the Shannon limit (C. Berrou, A. Glavieux, and
P. Thitimajshima, "Near Shannon Limit Error-correcting Coding and Decoding: Turbo-codes
(1)," in Proc. ICC'93, May, 1993). The encoding structure comprises two parallel recursive
systematic convolutional encoders (hereafter abbreviated as RSC). The important
characteristics are: (1) two convolutional codes with the same structure encode in
parallel, so the receiving end is able to decode the message iteratively; (2) the minimum
distance between code words is increased by using non-uniform random interleaving
(S. Benedetto and G. Montorsi, "Role of Recursive Convolutional Codes in Turbo Codes,"
Electron. Lett., Vol. 31, No. 11, pp. 858-859, 1995); and (3) soft-in soft-out decoding.

[0039] Because of the characteristics mentioned above, the error-correcting capability
is excellent. For this reason, the turbo-code is widely applied in general communication
systems such as the CDMA transmission system (J. Blanz, P. Jung, and M. Naßhan,
"Realistic Simulations of CDMA Mobile Radio Systems Using Joint Detection and Coherent
Receiver Antenna Diversity," IEEE Third International Symposium on Spread Spectrum
Techniques and Applications, Oulu, Finland, 1994).

[0040] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC
encoders. The input bit sequence is represented as $d = (d_1, d_2, d_3, \ldots, d_N)$,
where $d_k$ is the input bit of the encoder at time k, k runs from 1 to N, and N is the
block size. The output of the encoder at time k is represented as
$C_k = (x_k, y_{1k}, y_{2k})$. Since the encoder is systematic, $x_k = d_k$, and the
redundant code outputs are $y_{1k}$ and $y_{2k}$. The decoding structure of the turbo-code
is shown in FIG. 2. The decoder 200 comprises two recursive decoding units 210 and 220,
which are connected through the interleaving and deinterleaving units shown as 212, 214 and
216 in the diagram.

[0041] It is assumed that the noise in the communication channel is Gaussian noise. It
is further assumed that the noise on each transmitted symbol is independent, with an
expectation value of 0 and a variance of $N_0/2$. With binary modulation, if the input bit
$d_k$ is 0, the modulated value is -1.0; if the input bit $d_k$ is 1, the modulated value
is +1.0. Therefore, the received vector sequence R is represented as
$R = (r_1, r_2, r_3, \ldots, r_N)$, and the kth symbol is represented as

$$r_k = (r_{1s,k},\, r_{1p,k},\, r_{2p,k}) = (2x_k - 1 + n_{1s,k},\; 2y_{1k} - 1 + n_{1p,k},\; 2y_{2k} - 1 + n_{2p,k})$$

[0042] Here, $n_{1s,k}$, $n_{1p,k}$, and $n_{2p,k}$ are the noises of the channels
$r_{1s}$, $r_{1p}$, and $r_{2p}$ at time k, respectively, and they are independent of each
other. The details of the Maximum A Posteriori (hereafter abbreviated as MAP) algorithm
proposed by Bahl, Cocke, Jelinek, and Raviv (BCJR) (L. Bahl, J. Cocke, F. Jelinek, and
J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Trans.
Inform. Theory, Vol. 20, pp. 284-287, March 1974) are not repeated here; only the result
of the MAP algorithm is described. The objective of the MAP algorithm is to calculate, for
each input bit $d_k$, the ratio of the a posteriori probabilities (hereafter abbreviated
as APP) that the bit is 1 or 0, where k = 0, 1, 2, ..., N-1. From the derivation of Berrou,
Glavieux and Thitimajshima mentioned above, the following equation is obtained:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)} \qquad (1)$$

Here, $\alpha_k$ is the forward recursive probability parameter, $\beta_k$ is the backward
recursive probability parameter, and $\gamma_k^{(i)}$ is the branch probability parameter.
As the name suggests, the forward recursive probability parameter $\alpha_k$ is obtained
from the previous parameter $\alpha_{k-1}$ and the branch probability parameter
$\gamma_k^{(i)}$, as follows:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')} \qquad (2)$$

The backward recursive probability parameter $\beta_k$ is obtained from the next parameter
$\beta_{k+1}$ and the branch probability parameter $\gamma_{k+1}^{(i)}$, as follows:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m,m')\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\alpha_k(m')} \qquad (3)$$

The branch probability parameter $\gamma_k^{(i)}$ is obtained from the following equation
according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot q(d_k = i \mid s_k = m, s_{k-1} = m') \cdot \Pr\{s_k = m \mid s_{k-1} = m'\} \qquad (4)$$

Here, the parameter $q(d_k = i \mid s_k = m, s_{k-1} = m')$ is 1 if the input bit
$d_k = i$ (0 or 1) causes the transition from state $m'$ to state $m$, and 0 otherwise.
[0043] In a sequential calculation decoder, it is assumed that computing each
$\Lambda(d_k)$ in equation (1) needs one unit of time, where k runs from 0 to N+M-1, N
stands for the block length of the transmission, and M stands for the register size of the
decoder. It is further assumed that each of $\alpha_k$, $\beta_k$ and $\gamma_k^{(i)}$
(i = 0 or 1) in equations (2), (3), and (4) needs one unit of time. Five quantities
($\gamma_k^{(0)}$, $\gamma_k^{(1)}$, $\alpha_k$, $\beta_k$, and $\Lambda(d_k)$) are thus
computed for each of the N+M positions, so the first level decoder needs 5*(N+M) units of
time. With a decoding algorithm such as the Viterbi algorithm (A.J. Viterbi, "Error Bounds
for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Trans.
Inform. Theory, vol. IT-13, pp. 260-269, Apr. 1967) (A.J. Viterbi and J.K. Omura,
"Principles of Digital Communication and Coding," New York: McGraw-Hill, 1979) or the BCJR
algorithm mentioned above, if N is too small, the error-correcting capability is poor.
However, if N is too large, the decoding delay is too long to tolerate in a communication
system that requires real-time processing.

[0044] As mentioned in the previous paragraph, the decoding algorithm decides each bit
from the value of $\Lambda(d_k)$ in equation (1): if $\Lambda(d_k) > 0$, $d_k = 1$;
otherwise, $d_k = 0$. To calculate each $\Lambda(d_k)$ in equation (1), the $\alpha_k$,
$\beta_k$, and $\gamma_k^{(i)}$ in equations (2), (3), and (4) must be calculated first.
For a sequential calculation decoder, this needs 5*(N+M) units of time (G. Masera,
G. Piccinini, M.R. Roch, and M. Zamboni, "VLSI Architectures for Turbo Codes," IEEE Trans.
on VLSI Systems, vol. 7, no. 3, pp. 369-379, Sep. 1999).
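
For illustration only, the sequential evaluation of the forward recursion, the backward recursion, and the log-likelihood ratio can be sketched in Python as below. The function name `map_llr`, the nested-list layout `gamma[k][i][mp][m]`, and the 0-based indexing are assumptions made for this sketch and do not come from the specification. For simplicity each forward and backward vector is normalized by its own sum; a common scale factor cancels in the ratio of equation (1).

```python
import math

def map_llr(gamma, alpha0, beta_end):
    """Sequential MAP pass over one block (illustrative sketch).

    gamma[k][i][mp][m] is the branch probability for step k, input bit i,
    previous state mp and current state m (hypothetical layout).
    alpha0 and beta_end are the initial forward and backward state vectors.
    Returns the list of log-likelihood ratios, one per step.
    """
    steps = len(gamma)
    states = range(len(alpha0))

    # Forward recursion: alpha[k + 1] is built from alpha[k] and gamma[k].
    alpha = [list(alpha0)]
    for k in range(steps):
        raw = [sum(gamma[k][i][mp][m] * alpha[k][mp]
                   for mp in states for i in (0, 1))
               for m in states]
        total = sum(raw)
        alpha.append([a / total for a in raw])

    # Backward recursion: beta[k] is built from beta[k + 1] and gamma[k].
    beta = [None] * steps + [list(beta_end)]
    for k in range(steps - 1, -1, -1):
        raw = [sum(gamma[k][i][m][mn] * beta[k + 1][mn]
                   for mn in states for i in (0, 1))
               for m in states]
        total = sum(raw)
        beta[k] = [b / total for b in raw]

    # Log-likelihood ratio: bit-1 branches over bit-0 branches.
    llr = []
    for k in range(steps):
        num = sum(gamma[k][1][mp][m] * alpha[k][mp] * beta[k + 1][m]
                  for mp in states for m in states)
        den = sum(gamma[k][0][mp][m] * alpha[k][mp] * beta[k + 1][m]
                  for mp in states for m in states)
        llr.append(math.log(num / den))
    return llr
```

Each of the three loops visits every position of the block one after another, which is the source of the sequential delay discussed above. The sketch assumes that neither the bit-1 sum nor the bit-0 sum is zero.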
[0045] In order to increase the calculation speed and thus the throughput, a preferred
embodiment of the present invention adopts the parallel and systolic array VLSI structure
design. The whole decoder circuit is composed of P levels of parallel decoding units, as
shown in FIG. 3. A serial-in parallel-out unit before the first level receives the
messages $r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$, where the subscript k = 0, 1, ..., N+M-1
covers the whole block and the end message. Its output is sent to the first level decoding
unit. The other input of the first level decoding unit is $L_{0,k}$, the parameter
obtained via deinterleaving of the extrinsic parameter $\Lambda(d_k)$ of the previous
level; the initial value of this 0th level extrinsic parameter is set as
$L_{0,k} = (0, 0, \ldots, 0)$. The first level extrinsic parameter $L_{1,k}$ is generated
by the first level decoding unit, and the messages $r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$
are passed on in sequence to be the input of the next level.
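
The role of the serial-in parallel-out unit can be sketched as follows. This is a minimal Python illustration, not the circuit of FIG. 3: the function name `serial_to_parallel` and the assumed ordering of the samples in the serial stream (systematic sample, first parity sample, second parity sample, repeated for each k) are hypothetical.

```python
def serial_to_parallel(stream, n, m):
    """Split a serial stream into three parallel vectors of length n + m.

    The stream is assumed (hypothetically) to carry the samples in the order
    r_1s,0, r_1p,0, r_2p,0, r_1s,1, r_1p,1, r_2p,1, ...
    """
    length = n + m
    if len(stream) != 3 * length:
        raise ValueError("expected 3 * (N + M) samples")
    r_1s = list(stream[0::3])
    r_1p = list(stream[1::3])
    r_2p = list(stream[2::3])
    # The extrinsic input of the first level starts at all zeros.
    extrinsic = [0.0] * length
    return r_1s, r_1p, r_2p, extrinsic
```

The four returned vectors correspond to the parallel inputs presented to the first level: the three received messages and the all-zero initial extrinsic parameter.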

[0046] Each level of the decoding unit comprises two decoders. These two decoders are
the first decoder and the second decoder as shown in FIG. 4, wherein the structure of the
first decoder is similar to that of the second decoder. The whole systolic array VLSI
structure is shown in FIG. 5. N and M can be adjusted according to the design
requirements. For ease of description, the block length N=4 and the register size M=3 are
used as an example. FIG. 6 schematically shows the structure of the simplified modules,
data streams, and latches. It is apparent to those skilled in the art that this embodiment
is only an example and does not limit the scope of application of the present invention.
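
Within each level, an interleaving unit sits between the two decoders and a deinterleaving unit follows the second decoder. A minimal Python sketch of a random interleaver and its inverse is given below. The helper names and the use of a seeded pseudo-random permutation are assumptions for illustration; the specification only states that random interleaving is used.

```python
import random

def make_interleaver(length, seed=0):
    """Return a pseudo-random permutation of range(length) (assumed scheme)."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(values, perm):
    """Output position pos takes the input value at index perm[pos]."""
    return [values[p] for p in perm]

def deinterleave(values, perm):
    """Undo interleave(): put each value back at its original index."""
    out = [None] * len(perm)
    for pos, p in enumerate(perm):
        out[p] = values[pos]
    return out
```

Applying `deinterleave` to the output of `interleave` with the same permutation restores the original order, which is the property the two units rely on when the extrinsic parameter is passed between decoders.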

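[0047] According to the literature (I.L. Turner, "A Modified BAHL Algorithm for
Recursive System Convolutional Codes on Rayleigh Fading Channels," IEEE 49th
Vehicular Technology Conference, pp.75-76 vol.1, 1999), the a priori probability of the
input bit $d_k$ calculated by the previous level decoder is represented as

$$\Pr\{s_k = m \mid s_{k-1} = m'\} = \frac{e^{L(d_k)}}{1 + e^{L(d_k)}}, \quad \text{if } q(d_k = 1 \mid s_k = m, s_{k-1} = m') = 1 \qquad (5)$$

$$\Pr\{s_k = m \mid s_{k-1} = m'\} = 1 - \frac{e^{L(d_k)}}{1 + e^{L(d_k)}} = \frac{1}{1 + e^{L(d_k)}}, \quad \text{if } q(d_k = 0 \mid s_k = m, s_{k-1} = m') = 1 \qquad (6)$$

Here, $L(d_k)$ is the log likelihood ratio (LLR) extrinsic parameter calculated for the
message bit $d_k$ by the previous level decoder. Assuming an AWGN channel, the partial
probabilities in equation (4) are calculated as follows:

$$p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{1s}} \exp\left\{-\frac{(r_{1s,k} - \mu_{1s})^2}{2\sigma_{1s}^2}\right\} \qquad (7)$$

$$p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{1p}} \exp\left\{-\frac{(r_{1p,k} - \mu_{1p}(m',m))^2}{2\sigma_{1p}^2}\right\} \qquad (8)$$

Here, $\mu_{1s}$ and $\mu_{1p}(m',m)$ are the expectation values of $r_{1s}$ and $r_{1p}$,
respectively. $\mu_{1s}$ depends on the input bit, while $\mu_{1p}(m',m)$ depends on the
input bit and is also affected by the previous state and the current state.
$\sigma_{1s}^2$ and $\sigma_{1p}^2$ are the variances of $r_{1s}$ and $r_{1p}$,
respectively. It is assumed that the variances of $r_{1s}$ and $r_{1p}$ are the same and
equal to $\sigma^2$. Therefore, the above two equations can be multiplied and consolidated
as follows:

$$p(r_{1s,k} \mid \cdot)\; p(r_{1p,k} \mid \cdot) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{(r_{1s,k} - \mu_{1s})^2 + (r_{1p,k} - \mu_{1p}(m',m))^2}{2\sigma^2}\right\} \qquad (9)$$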



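For a discrete memoryless Gaussian channel, the branch probability parameter
$\gamma_k^{(1)}$ or $\gamma_k^{(0)}$, for an input bit of 1 or 0, can be calculated from
equations (4), (5), (6), and (9) as follows:

$$\gamma_k^{(1)}(m',m) = \frac{e^{L(d_k)}}{1 + e^{L(d_k)}} \cdot \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}\left[(r_{1s,k} - 1)^2 + (r_{1p,k} - \mu_{1p}(m',m))^2\right]\right\} \qquad (10)$$

$$\gamma_k^{(0)}(m',m) = \frac{1}{1 + e^{L(d_k)}} \cdot \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}\left[(r_{1s,k} + 1)^2 + (r_{1p,k} - \mu_{1p}(m',m))^2\right]\right\} \qquad (11)$$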
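
As a hedged illustration of what one module C computes for a single branch, the sketch below evaluates the two branch probabilities from the a priori term and the Gaussian likelihood term. The function name, the argument names, and the separate parity means `mu_p1` and `mu_p0` for the bit-1 and bit-0 branches are assumptions of this sketch; the systematic means +1 and -1 follow from the binary modulation described earlier.

```python
import math

def branch_probabilities(llr_in, r_s, r_p, mu_p1, mu_p0, sigma):
    """Return (gamma_1, gamma_0) for one trellis branch pair (sketch).

    llr_in : extrinsic LLR L(d_k) from the previous level
    r_s    : received systematic sample
    r_p    : received parity sample
    mu_p1  : expected parity value on the bit-1 branch (assumed input)
    mu_p0  : expected parity value on the bit-0 branch (assumed input)
    sigma  : common noise standard deviation
    """
    # A priori probabilities of bit 1 and bit 0 from the extrinsic LLR.
    prior1 = math.exp(llr_in) / (1.0 + math.exp(llr_in))
    prior0 = 1.0 / (1.0 + math.exp(llr_in))
    scale = 1.0 / (2.0 * math.pi * sigma ** 2)
    two_var = 2.0 * sigma ** 2
    # Bit 1 is modulated to +1, bit 0 to -1 on the systematic channel.
    g1 = prior1 * scale * math.exp(-((r_s - 1.0) ** 2 + (r_p - mu_p1) ** 2) / two_var)
    g0 = prior0 * scale * math.exp(-((r_s + 1.0) ** 2 + (r_p - mu_p0) ** 2) / two_var)
    return g1, g0
```

Because every position k uses only its own inputs, N+M such evaluations are independent of one another. For very large magnitudes of `llr_in` the call to `math.exp` can overflow; a hardware or production implementation would normally work in the log domain instead.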



[0048] According to equations (10) and (11), the branch probability parameters
$\gamma_k^{(i)}(m',m)$ can be calculated in parallel. N+M units of module C (as shown in
FIG. 7) are used to calculate every $\gamma_k^{(i)}(m',m)$ in parallel. Thus, the N+M
units of time can be shortened to one unit of time. The input signals of module C in
FIG. 7 are the extrinsic parameter, $r_{1s,k}$ and $r_{1p,k}$, where k = 1, ..., N+M.
Module C is used to calculate $\gamma_k^{(1)}(m',m)$ and $\gamma_k^{(0)}(m',m)$.

[0049] In addition, the forward recursive probability parameter $\alpha_k$ is output
from the previous level to be the input of the next level, and the backward recursive
probability parameter $\beta_k$ is output from the next level to be the input of the
previous level, so the calculation is well suited to a systolic array VLSI design that
increases the calculation speed. According to equation (2), N+M units of module A (as
shown in FIG. 8) are used to calculate $\alpha_k$. The first level uses the inputs
$\gamma_1^{(1)}(m',m)$ and $\gamma_1^{(0)}(m',m)$ and the initial value $\alpha_0(m)$ of
the forward recursive probability parameter to calculate $\alpha_1(m)$. The second level
uses the inputs $\gamma_2^{(1)}(m',m)$, $\gamma_2^{(0)}(m',m)$ and $\alpha_1(m)$ to
calculate $\alpha_2(m)$. Thus, the systolic array is able to work simultaneously. All
$\alpha_k(m)$, where k = 1, ..., N+M, are available after N+M units of time.

[0050] According to equation (3), N+M units of module B (as shown in FIG. 9) are used to
calculate $\beta_k$. The first level uses the inputs $\gamma_{N+M}^{(1)}(m',m)$ and
$\gamma_{N+M}^{(0)}(m',m)$ and the initial value $\beta_{N+M}(m)$ of the backward
recursive probability parameter to calculate $\beta_{N+M-1}(m)$. The second level uses the
inputs $\gamma_{N+M-1}^{(1)}(m',m)$, $\gamma_{N+M-1}^{(0)}(m',m)$ and the backward
recursive probability parameter $\beta_{N+M-1}(m)$ to calculate $\beta_{N+M-2}(m)$. The
advantage is that every module has the same structure and the output of the previous
level is the input of the next level. Thus, the throughput is (N+M) times the original
throughput.
[0051] When the calculation of $\alpha_k$, $\beta_k$ and $\gamma_k^{(i)}$ is completed,
N+M units of module D (as shown in FIG. 10) are used to calculate $\Lambda(d_k)$ according
to equation (1). By using parallel calculation, the N+M units of time are shortened to
one unit of time.

[0052] The submodule L, located between module A and module B, calculates the
product-sum of two inputs. In the example shown in FIG. 11, the submodule L adopts an
analog circuit provided by the conventional technique. The analog circuits proposed in the
reference literature can also be used, for example H.-A. Loeliger, F. Lustenberger,
F. Tarkoy, M. Helfenstein, "Decoding in Analog VLSI," IEEE Communications Magazine,
Vol. 37(4), pp. 99-101, Apr. 1999; H.-A. Loeliger, F. Lustenberger, M. Helfenstein,
F. Tarkoy, "Probability Propagation and Decoding in Analog VLSI," IEEE Trans. on
Information Theory, Vol. 47(2), pp. 837-843, Feb. 2001; or F. Lustenberger,
M. Helfenstein, H.-A. Loeliger, F. Tarkoy, G.S. Moschytz, "An Analog VLSI Decoding
Technique for Digital Codes," ISCAS '99, Proceedings of the 1999 IEEE International
Symposium on Circuits and Systems, Vol. 2, pp. 424-427, 1999.
[0053] For ease of describing the detailed structure of modules A, B, and D mentioned
above, the preferred embodiment of the present invention uses the turbo-code of the third
generation CDMA mobile communication standard as an example. However, this does not limit
the scope of application of the present invention. The turbo-code of the third generation
CDMA mobile communication standard has a decoder register size M=3. For the first decoder
and the second decoder, the code ratio is R=1/3, and the parameters of the feedback
generator and the direct feed-forward generator are Gb=1011 and Gd=1110, respectively. The
recursive systematic convolutional encoder (hereafter abbreviated as RSC) is shown in
FIG. 12; it adopts the fast RSC encoder. For the details of the fast RSC encoder, please
refer to the "Fast Turbo-code Encoder" proposed by the same inventor of the present
invention in April, 2001. The trellis diagram is shown in FIG. 13.
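
To make the role of the two generator parameters concrete, the following Python sketch implements a conventional bit-serial RSC encoder with register size 3. It is not the fast RSC encoder of FIG. 12, whose details are given in the separate disclosure. The function name `rsc_encode` and the tap ordering (the leftmost generator digit applies to the current feedback bit, the remaining digits to the register stages from newest to oldest) are assumptions of this sketch.

```python
def rsc_encode(bits, feedback=(1, 0, 1, 1), forward=(1, 1, 1, 0)):
    """Conventional RSC constituent encoder (illustrative, assumed tap order).

    Returns (systematic, parity) bit lists of the same length as bits.
    """
    m = len(feedback) - 1          # register size M
    state = [0] * m                # shift register, newest stage first
    systematic, parity = [], []
    for d in bits:
        # Feedback bit: input XOR the register stages selected by feedback taps.
        a = d
        for tap, s in zip(feedback[1:], state):
            a ^= tap & s
        # Parity bit: forward taps applied to the feedback bit and the register.
        p = forward[0] & a
        for tap, s in zip(forward[1:], state):
            p ^= tap & s
        systematic.append(d)       # systematic output equals the input bit
        parity.append(p)
        state = [a] + state[:-1]   # shift the new feedback bit into the register
    return systematic, parity
```

With M=3 the register has 8 possible states, which is the number of states a trellis for such a code has. A different tap-ordering convention would yield a different parity sequence for the same generator digits.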

[0054] FIG. 6 schematically shows the structure of the simplified modules, data streams,
and latches when the block length N=4 and the register size M=3. There are N+M=7 units
each of modules A, B, C, and D. In the first unit of time, the parallel input signals (the
extrinsic parameter, $r_{1s,k}$, and $r_{1p,k}$, k = 1, 2, ..., 7) are used simultaneously
to calculate $\gamma_1^{(i)}, \gamma_2^{(i)}, \ldots, \gamma_7^{(i)}$. In the following 7
units of time, $\alpha_1, \alpha_2, \ldots, \alpha_7$ and
$\beta_1, \beta_2, \ldots, \beta_7$ are calculated. In one further unit of time, according
to equation (1), the parallel inputs $\gamma_k^{(1)}(m',m)$, $\gamma_k^{(0)}(m',m)$,
$\alpha_{k-1}$ and $\beta_k$ are used to calculate $\Lambda(d_k)$. $\Lambda(d_k)$ is used
as the extrinsic parameter of the next level. When the last level is reached, $d_k$ is
determined accordingly: if $\Lambda(d_k) > 0$, $d_k = 1$; otherwise $d_k = 0$.
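
The timing just described can be tabulated with a short Python sketch. The function name `systolic_schedule` and the textual labels are illustrative assumptions; the backward index is counted down from N+M-1, on the assumption that the backward recursion starts from the end of the block.

```python
def systolic_schedule(n, m):
    """List (time_unit, module, work) entries for one block (sketch)."""
    steps = n + m
    # Unit 1: all module C units work at once.
    schedule = [(1, "C", "all branch probabilities gamma in parallel")]
    # Units 2 .. steps + 1: one forward and one backward step per unit of time.
    for t in range(steps):
        work = f"alpha_{t + 1} and beta_{steps - 1 - t}"
        schedule.append((2 + t, "A and B", work))
    # Final unit: all module D units work at once.
    schedule.append((steps + 2, "D", "all Lambda(d_k) in parallel"))
    return schedule

for entry in systolic_schedule(4, 3):
    print(entry)
```

For N=4 and M=3 the list has 1 + 7 + 1 = 9 entries, matching the N+M+2 units of time stated for this structure.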

[0055] According to the trellis diagram of FIG. 13, it is easy to simplify the structure
of modules A, B, and D. FIG. 14 schematically shows the detailed structure of module A
based on this design. The detailed structure of module B is similar to that of module A.
The detailed structure of module D is shown in FIG. 15.

[0056] The latency for completing a message one block size in length with the parallel
and systolic array VLSI structure design of the preferred embodiment of the present
invention is, as shown in FIG. 16, N+M+2 units of time. Compared to the conventional
sequential calculation structure, which needs 5*(N+M) units of time, the time is shortened
to about 1/5. Furthermore, the systolic array VLSI structure design is able to generate a
set of $d_k$ in every unit of time after the first set of $d_k$ is generated.

The performance comparison is shown in Table 1:

Table 1: Structure comparison of the systolic array and the sequential type

Item/Structure | Sequential Structure | Systolic Array Structure | Pro and Con
Latency | 5*(N+M) | (N+M)+2 | The latency is about 1/5
Output Time | | | The throughput is about 5*(N+M) times
Number of Hardware Gates | 1 | 5*(N+M) | The complexity of the circuit is about 5*(N+M) times
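
As a worked example of the latency row, the two latency expressions can be evaluated numerically. The function name `latency_comparison` is an assumption of this sketch; the formulas are the ones stated for the sequential and systolic structures.

```python
def latency_comparison(n, m):
    """Return (sequential, systolic, ratio) latencies in units of time."""
    sequential = 5 * (n + m)     # conventional sequential structure
    systolic = (n + m) + 2       # parallel and systolic array structure
    return sequential, systolic, systolic / sequential

# Small example (N=4, M=3) and the block length used in the simulation.
print(latency_comparison(4, 3))
print(latency_comparison(65536, 3))
```

For N=4 and M=3 the latencies are 35 and 9 units of time. As N grows, the ratio approaches 1/5, since the constant 2 becomes negligible against N+M.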



[0057] In order to verify the error-correcting performance of the preferred embodiment
of the present invention, the CDMA mobile communication system mentioned above is used as
an example. The RSC encoder with register size M=3 is shown in FIG. 12, and the trellis
diagram is shown in FIG. 13. The iterative decoding number is P=6. The random interleaving
method is adopted between the first decoder and the second decoder. The simulation result
is shown in FIG. 17, wherein the block length N=65536; the vertical axis is the decoding
performance denoted by the bit error rate (BER), and the horizontal axis is the
communication environment denoted by the signal/noise ratio. As can be seen, for the same
signal/noise ratio, the larger the iterative decoding number, the better the decoding
performance. This agrees with the theory and is similar to the simulation results
disclosed in the literature: C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon
Limit Error-correcting Coding and Decoding: Turbo-codes (1)," in Proc. ICC'93, May, 1993,
and P. Robertson, "Illuminating the Structure of Code and Decoder of Parallel Concatenated
Recursive Systematic (Turbo) Codes," in Proc. IEEE GLOBECOM Conf., San Francisco, CA,
pp. 1298-1303, Dec. 1994.

[0058] The present simulation was written in the C programming language and run on a
personal computer with a GenuineIntel Pentium® III CPU and 128 MB of RAM, under the
Windows Me® operating system. For the bit error rate comparison shown in FIG. 17, the
iterative decoding number is p = 1, ..., 6, the code ratio R=1/3, the register size M=3,
the generator parameters are Gb=1011 and Gd=1110, and the 256*256 random interleaving and
deinterleaving method is used.
[0059] The present invention provides a fast turbo-code decoder built from systolic
array VLSI circuits. Since the output of the previous level is used as the input of the
next level, the advantages of parallel and pipelined calculation are fully achieved. The
latency is only N+M+2 units of time, which is about 1/5 of the 5*(N+M) units of time taken
by the conventional sequential calculation structure. The decoding throughput is about
5*(N+M) times higher than that of the conventional decoder. Although the number of circuit
gates is also about 5*(N+M) times higher than in the conventional decoder, VLSI techniques
have improved steadily, so the hardware complexity is easy to overcome. Trading hardware
cost for higher speed is expected to remain the trend.
[0060] Although the invention has been described with reference to a particular
embodiment thereof, it will be apparent to one of ordinary skill in the art that
modifications to the described embodiment may be made without departing from the
spirit of the invention. Accordingly, the scope of the invention will be defined by the
attached claims and not by the above detailed description.






